Sparse Modeling for Artist Identification: Exploiting Phase Information and Vocal Separation

Authors

  • Li Su
  • Yi-Hsuan Yang
Abstract

Because artist identification deals with the vocal part of music, techniques such as vocal sound separation and speech feature extraction have been found relevant. In this paper, we argue that phase information, which is usually overlooked in the literature, is also informative for modeling the voice timbre of a singer, given the necessary processing techniques. Specifically, instead of directly using the raw phase spectrum as features, we show that significantly better performance can be obtained by learning sparse features from the negative derivative of phase with respect to frequency (i.e., the group delay function) using unsupervised feature learning algorithms. Moreover, performance improves further when singing voice separation is used as a pre-processing step and features are then learned from both the magnitude spectrum and the group delay function. The proposed system achieves 66% accuracy in identifying 20 artists from the artist20 dataset, outperforming prior art by 7%.
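The group delay function mentioned above can be computed without explicit phase unwrapping. The following is a minimal sketch, not the authors' implementation: it uses the standard identity that the group delay equals the real part of FFT(n·x[n]) / FFT(x[n]). The function name `group_delay` and the parameters `n_fft` and `eps` are assumptions made for illustration; the abstract does not state how the authors compute or post-process this quantity.

```python
import numpy as np

def group_delay(frame, n_fft=None, eps=1e-10):
    """Group delay (in samples) of one signal frame, i.e. -d(phase)/d(omega).

    Hypothetical helper for illustration. Uses the identity
    tau = Re(Y / X) = (X_R * Y_R + X_I * Y_I) / |X|^2,
    where X = FFT(x[n]) and Y = FFT(n * x[n]).
    """
    frame = np.asarray(frame, dtype=float)
    if n_fft is None:
        n_fft = len(frame)
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n_fft)        # spectrum of the frame
    Y = np.fft.rfft(n * frame, n_fft)    # spectrum of the time-weighted frame
    power = np.abs(X) ** 2
    # eps guards against division by zero at spectral nulls (an assumption)
    return (X.real * Y.real + X.imag * Y.imag) / np.maximum(power, eps)

# Sanity check: an impulse delayed by 5 samples has linear phase,
# so its group delay should be 5 samples in every frequency bin.
impulse = np.zeros(64)
impulse[5] = 1.0
gd = group_delay(impulse)
```

In a pipeline like the one the abstract describes, such a per-frame vector would be one input to feature learning alongside the magnitude spectrum; the sparse feature learning and voice separation steps are not shown here.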


Similar resources

Exploiting joint sparsity in compressed sensing-based RFID

We propose a novel scheme to improve compressed sensing (CS)-based radio frequency identification (RFID) by exploiting multiple measurement vectors. Multiple measurement vectors are obtained by employing multiple receive antennas at the reader or by separation into real and imaginary parts. Our problem formulation renders the corresponding signal vectors jointly sparse, which in turn enables the...


Nonnegative Tensor Factorization with Frequency Modulation Cues for Blind Audio Source Separation

We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our approach extends Nonnegative Matrix Factorization for audio modeling by including local estimates of frequency modulation as cues in the separation. This permi...


Singing Voice Separation from Monaural Recordings

Separating singing voice from music accompaniment has wide applications in areas such as automatic lyrics recognition and alignment, singer identification, and music information retrieval. Compared to the extensive studies of speech separation, singing voice separation has been little explored. We propose a system to separate singing voice from music accompaniment from monaural recordings. The ...


Computational methods for underdetermined convolutive speech localization and separation via model-based sparse component analysis

In this paper, the problem of speech source localization and separation from recordings of convolutive underdetermined mixtures is studied. The problem is cast as recovering the spatio-spectral speech information embedded in a microphone array's compressed measurements of the acoustic field. A model-based sparse component analysis framework is formulated for sparse reconstruction of the speech sp...


Audio Source Separation Using Hierarchical Phase-Invariant Models

Audio source separation consists of analyzing a given audio recording so as to estimate the signal produced by each sound source for listening or information retrieval purposes. In the last five years, algorithms based on hierarchical phase-invariant models such as single- or multichannel hidden Markov models (HMMs) or nonnegative matrix factorization (NMF) have become popular. In this paper, we ...




Publication date: 2013